This is part of the Self-Driving Car Nanodegree by Udacity.
The goal is to classify German traffic sign images with a deep learning model.
In this notebook I will give an overview of the data. The images are then normalized so that the minimum value of each image is 0 and the maximum value is 255. The number of images is also increased with augmentation techniques: changes in brightness (normalized again afterwards), rotation, translation, scaling and shearing.
The deep learning model itself has the same architecture as LeNet-5, but with increased filter depth, since the images are in colour and the input therefore has more channels. As a result of the larger depth, the fully connected layers have more neurons. I also added a dropout layer, as the net tends to overfit the training examples: without augmentation it reaches 100% accuracy on the training set quite early. Even with augmentation the net is overfitting.
This net reached a validation accuracy of 94.1% and a test accuracy of 95.5%. It is still overfitting, but I'm running out of time to tweak the net further.
Download and extract the file:
import tensorflow as tf
import pandas as pd
import urllib.request
import zipfile
import os.path
from os import makedirs
url = "https://d17h27t6h515a5.cloudfront.net/topher/2017/February/5898cd6f_traffic-signs-data/traffic-signs-data.zip"
local_file = "./data/traffic-signs-data.zip"
if not os.path.exists("./data"):
    os.makedirs("data")
if not os.path.exists(local_file):
    urllib.request.urlretrieve(url, local_file)
    print("Downloaded file")
    zip_ref = zipfile.ZipFile(local_file, 'r')
    zip_ref.extractall("./data/traffic-signs")
    zip_ref.close()
    print("Extracted file")
else:
    print("File already exists")
# Load pickled data
import pickle
base_data_path = "./data/traffic-signs/"
training_file = base_data_path + "train.p"
validation_file = base_data_path + "valid.p"
testing_file = base_data_path + "test.p"
with open(training_file, mode='rb') as f:
    train = pickle.load(f)
with open(validation_file, mode='rb') as f:
    valid = pickle.load(f)
with open(testing_file, mode='rb') as f:
    test = pickle.load(f)
X_train, y_train = train['features'], train['labels']
X_valid, y_valid = valid['features'], valid['labels']
X_test, y_test = test['features'], test['labels']
class_names = pd.read_csv("./signnames.csv")
The pickled data is a dictionary with 4 key/value pairs:

- 'features' is a 4D array containing the raw pixel data of the traffic sign images, (num examples, width, height, channels).
- 'labels' is a 1D array containing the label/class id of the traffic sign. The file signnames.csv contains id -> name mappings for each id.
- 'sizes' is a list containing tuples, (width, height), representing the original width and height of the image.
- 'coords' is a list containing tuples, (x1, y1, x2, y2), representing the coordinates of a bounding box around the sign in the image. These coordinates refer to the original images; the pickled data contains resized (32 by 32) versions of them.

import numpy as np
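As a minimal illustration of this layout (using small synthetic arrays and a temporary file, not the real dataset), a pickle with the same four keys can be written and read back like this:

```python
import os
import pickle
import tempfile

import numpy as np

# Synthetic stand-in for one of the dataset pickles: 10 fake 32x32 RGB images.
fake = {
    "features": np.zeros((10, 32, 32, 3), dtype=np.uint8),
    "labels":   np.zeros(10, dtype=np.int64),
    "sizes":    [(1360, 800)] * 10,     # original (width, height)
    "coords":   [(5, 5, 25, 25)] * 10,  # bounding box in the ORIGINAL image
}

path = os.path.join(tempfile.mkdtemp(), "fake.p")
with open(path, "wb") as f:
    pickle.dump(fake, f)

with open(path, "rb") as f:
    data = pickle.load(f)

X, y = data["features"], data["labels"]
print(X.shape, y.shape)  # (10, 32, 32, 3) (10,)
```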
### Replace each question mark with the appropriate value.
### Use python, pandas or numpy methods rather than hard coding the results
# SOLUTION: Number of training examples
n_train = y_train.shape[0]
# SOLUTION: Number of validation examples
n_validation = y_valid.shape[0]
# SOLUTION: Number of testing examples.
n_test = y_test.shape[0]
# SOLUTION: What's the shape of a traffic sign image?
image_shape = X_train[0].shape
# SOLUTION: How many unique classes/labels are in the dataset.
n_classes = len(np.unique(y_train))
print("Number of training examples =", n_train)
print("Number of validation examples =", n_validation)
print("Number of testing examples =", n_test)
print("Image data shape =", image_shape)
print("Number of classes =", n_classes)
import matplotlib.pyplot as plt
%matplotlib inline
Let's take a look at the class distribution in each set.
import pandas as pd
def class_histogram(labels, title_postfix = ""):
    pd.DataFrame(labels).hist(bins = n_classes, grid = False)
    plt.title("Distribution of Labels" + title_postfix)
    plt.xlabel("Class Number")
class_histogram(y_train, " in Train Data")
class_histogram(y_valid, " in Validation Data")
class_histogram(y_test, " in Test Data")
The class distribution is pretty much the same in each set, though some classes are much more prominent than others. To improve results further, one could rebalance the class distribution with augmented images.
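A sketch of one way to rebalance (oversampling each class up to the size of the largest one with numpy; in the real pipeline the duplicated indices would point at augmented copies rather than exact duplicates):

```python
import numpy as np

def oversample_indices(labels, rng=None):
    """Return indices that repeat each class up to the largest class count."""
    if rng is None:
        rng = np.random.default_rng(0)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    picked = []
    for c in classes:
        idx = np.flatnonzero(labels == c)
        # Sample with replacement so rare classes reach the target count.
        picked.append(rng.choice(idx, size=target, replace=True))
    return np.concatenate(picked)

labels = np.array([0, 0, 0, 0, 1, 2, 2])
idx = oversample_indices(labels)
counts = np.bincount(labels[idx])
print(counts.tolist())  # [4, 4, 4]
```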
Now let's take a look at some images of each class, as they are ordered in the dataset.
def plot_classes(images_per_class, labels, images, shuffle_in_class = False):
    classes = pd.DataFrame({"Class": labels}).groupby("Class")
    for class_number, group in classes:
        # Pick the images for this class once, either randomly or from the front.
        if shuffle_in_class:
            elements = group.sample(images_per_class).index
        else:
            elements = group.head(images_per_class).index
        for column in range(images_per_class):
            plt.subplot(1, images_per_class, column + 1)
            plt.imshow(images[elements[column]])
            plt.axis('off')
            if column + 1 == int((images_per_class + 1) / 2):
                plt.title("Class: " + str(class_names.loc[class_number, "SignName"]) + ", Images: " + str(group.shape[0]))
        plt.show()
plot_classes(5, y_train, X_train)
The class distribution seems to match the distribution of signs on German streets. For example, the 20 km/h speed limit sign is very rare to see, as the speed limit in residential areas is usually 30 km/h, so there are only 180 images of this sign. The "popular" signs have up to 2000 images.
As far as I know, the images were taken from a moving car, with multiple shots of each sign as the car passed by. The dataset therefore contains the same sign with slight variations in angle and brightness. Let's take a look at random pictures per class to see some different ones.
plot_classes(5, y_train, X_train, shuffle_in_class=True)
We will also take a look at the images in the validation and test set.
plot_classes(5, y_valid, X_valid, shuffle_in_class=True)
plot_classes(5, y_test, X_test, shuffle_in_class=True)
For preprocessing, the images are normalized so that every image uses the full brightness spectrum from 0 to 255. This improves images that are very dark, but also introduces colour shifts. One could use better techniques to normalize the brightness, such as converting to the HSL colour space and normalizing the lightness there.
Normalizing the images improved the accuracy of the net.
For augmentation, a random brightness factor is applied and the images are then normalized; brightness is changed before normalizing to emulate different lighting conditions at capture time. Random translations, rotations, shearing and scaling are also applied. These augmentation techniques are used quite often in image recognition tasks and should reduce overfitting.
Finally, the image values are converted to the range -1.0 to 1.0 for deep learning.
def to_float_image(images):
    """
    Converts images in the range 0 - 255 to the range 0.0 - 1.0.
    This can still be plotted with matplotlib.
    """
    return images.astype(np.float32) / 255.0
This type of normalization changes the colours of dark images a lot, as every colour channel is multiplied by the same constant factor. To improve on this, one could convert the image to the HSL colour space, adjust the lightness and convert back to RGB.
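As a sketch of that idea (using matplotlib's `rgb_to_hsv`/`hsv_to_rgb` helpers; HSV is used here in place of HSL, which is close enough for stretching brightness), one could rescale only the value channel and leave hue and saturation untouched:

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv, hsv_to_rgb

def normalize_brightness_hsv(image):
    """Stretch only the V channel to the full range, preserving hue/saturation."""
    hsv = rgb_to_hsv(image.astype(np.float32) / 255.0)
    v = hsv[..., 2]
    span = v.max() - v.min()
    if span > 0:
        hsv[..., 2] = (v - v.min()) / span
    return (hsv_to_rgb(hsv) * 255.0).round().astype(np.uint8)

# A dark gray image becomes bright without any colour shift.
dark = np.full((4, 4, 3), 40, dtype=np.uint8)
dark[0, 0] = (10, 10, 10)
out = normalize_brightness_hsv(dark)
print(out.max())  # 255
```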
def normalize(images):
    """
    The pixel values of a normalized image range from 0 to 255.
    So if an image's darkest pixel is 10 and its brightest pixel is 200,
    after normalizing they are 0 and 255. All other pixel values are
    scaled accordingly.
    """
    result = np.empty(images.shape)
    for i in range(len(images)):
        image = images[i].astype(np.float32)
        image -= image.min()
        image *= (255.0 / image.max())
        result[i] = image
    return result.astype(np.ubyte)
def float_image_to_deep_learning_input(images):
    """
    Input should have values in the range 0.0 - 1.0.
    Returns images in the range -1.0 - 1.0.
    This can not be plotted with matplotlib.
    """
    return (images - 0.5) * 2
from skimage import transform as itrf
def augment_image(image):
    """
    Adds random brightness, shearing, rotation, translation and scaling to the image.
    """
    random_brightness = np.clip(np.random.normal(loc = 1.0, scale = 0.5), 0.1, 3.0)
    image = np.clip(image * random_brightness, 0, 255).astype(np.ubyte)
    # normalize() expects a batch of images, so wrap and unwrap the single image.
    image = normalize(image[np.newaxis])[0]
    random_shear = np.random.normal(loc = 0.0, scale = 0.1)
    random_scale = (np.random.normal(loc = 1.0, scale = 0.1), np.random.normal(loc = 1.0, scale = 0.1))
    random_rotation = np.random.normal(loc = 0.0, scale = 0.1)
    random_translation = (np.random.normal(loc = 0.0, scale = 0.1), np.random.normal(loc = 0.0, scale = 0.1))
    afine_tf = itrf.AffineTransform(shear = random_shear, scale = random_scale,
                                    rotation = random_rotation, translation = random_translation)
    return itrf.warp(image, inverse_map=afine_tf)
Example of one image augmentation. (Execute multiple times to see the randomness)
plt.imshow(augment_image(X_train[100]))
Here I will show the results of different augmentation steps:
Original Image:
plt.imshow(X_train[100])
Adding random brightness:
random_brightnes = np.clip(np.random.normal(loc = 1.0, scale = 0.5), 0.1, 3.0)
image = np.clip((X_train[100] * random_brightnes),0, 255).astype(np.ubyte)
plt.imshow(image)
Normalizing:
image = normalize(image[np.newaxis])[0]  # normalize() expects a batch of images
plt.imshow(image)
Random rotation, scale, shear and translation. These are applied at once by warping the image with an affine transform matrix.
random_shear = np.random.normal(loc = 0.0, scale = 0.1)
random_scale = (np.random.normal(loc = 1.0, scale = 0.1), np.random.normal(loc = 1.0, scale = 0.1))
random_rotation = np.random.normal(loc = 0.0, scale = 0.1)
random_translation = (np.random.normal(loc = 0.0, scale = 0.1), np.random.normal(loc = 0.0, scale = 0.1))
afine_tf = itrf.AffineTransform(shear = random_shear, scale = random_scale, rotation = random_rotation, translation = random_translation)
image = itrf.warp(image, inverse_map=afine_tf)
plt.imshow(image)
def image_augmentation_pipeline(images):
    result = np.zeros(images.shape)
    for i in range(len(images)):
        result[i] = augment_image(images[i])
    return result

def prepare_images_for_deep_learning(images):
    images = normalize(images)
    images = to_float_image(images)
    images = float_image_to_deep_learning_input(images)
    return images
Here the augmented images are precomputed. I tried to generate the augmented images during the training loop to add randomness and save memory, but that increased the training time significantly. I wanted to observe the improvement of the net while training so I could stop early on bad results, and therefore needed a fast training pipeline. So the images are generated before training.
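For reference, the rejected on-the-fly alternative would look roughly like this: a generator that augments each batch only when it is requested (`augment_fn` stands in for `augment_image` applied per image).

```python
import numpy as np

def augmented_batches(X, y, batch_size, augment_fn, rng=None):
    """Yield shuffled (batch_x, batch_y) pairs, augmenting lazily per batch."""
    if rng is None:
        rng = np.random.default_rng(0)
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield np.stack([augment_fn(img) for img in X[idx]]), y[idx]

# Tiny demo with an identity "augmentation".
X = np.zeros((10, 2, 2, 3))
y = np.arange(10)
batches = list(augmented_batches(X, y, 4, lambda img: img))
print(len(batches))  # 3
```

This trades memory for time: nothing is stored up front, but every epoch pays the augmentation cost again, which is exactly the slowdown described above.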
%%time
AUGMENTATION_RUNS = 5
X_train_augmented = np.zeros((AUGMENTATION_RUNS,) + X_train.shape)
for i in range(AUGMENTATION_RUNS):
    X_train_augmented[i] = image_augmentation_pipeline(X_train)
plot_classes(5, y_train, X_train_augmented[0], shuffle_in_class=False)
X_train_prepared = prepare_images_for_deep_learning(X_train)
X_train_augmented_prepared = prepare_images_for_deep_learning(X_train_augmented)
X_test_prepared = prepare_images_for_deep_learning(X_test)
X_valid_prepared = prepare_images_for_deep_learning(X_valid)
Below is a table of the architecture. It is the same as the LeNet-5 architecture, but with deeper filters and therefore more fully connected nodes.
| Layer | Description |
|---|---|
| Input | 32x32x3 Image Data |
| Convolution 5x5 | Valid Padding, 1x1 Strides, 28x28x16 Output |
| Relu | |
| Max Pool | 2x2 Strides, 14x14x16 Output |
| Convolution 5x5 | Valid Padding, 1x1 Strides, 10x10x32 Output |
| Relu | |
| Max Pool | 2x2 Strides, 5x5x32 Output |
| Flatten | 800 Output |
| Fully Connected | 200 Output |
| Relu | |
| Fully Connected | 100 Output |
| Relu | |
| Dropout | 50% Keep Probability |
| Fully Connected | 43 Output |
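The output sizes in the table follow from the standard valid-padding formulas; a quick check of the arithmetic (plain Python, no TensorFlow needed):

```python
def conv_valid(size, kernel, stride=1):
    # Valid padding: output = (input - kernel) / stride + 1
    return (size - kernel) // stride + 1

def pool(size, stride=2):
    # 2x2 max pooling with stride 2 halves the spatial size.
    return size // stride

s = conv_valid(32, 5)   # conv1 -> 28
s = pool(s)             # max pool -> 14
s = conv_valid(s, 5)    # conv2 -> 10
s = pool(s)             # max pool -> 5
flat = s * s * 32       # flatten with 32 filters -> 800
print(s, flat)  # 5 800
```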
I used the same architecture as LeNet-5, as it was built for image classification tasks. Most of the code was also implemented earlier in the nanodegree, so it seemed obvious to reuse it.
To improve the accuracy of the net, I changed the depth of the filters and therefore had to increase the number of fully connected neurons accordingly. At first I started with quite deep filters and a lot of neurons, then decreased the filters step by step to prevent overfitting until I got good results.
I first tried to create an overfitting model, to confirm that the model is able to learn the concepts. Then I reduced overfitting by decreasing filter depths, adding a dropout layer with a 50% keep probability, and using image augmentation.
I used the Adam optimizer with its default learning rate of 0.001. I tried to lower the learning rate, but that seemed to increase training time without accuracy improvements; the Adam optimizer should adapt the learning rate anyway.
I also tried to increase the batch size until I ran into memory issues, but that seemed to lower the accuracy somewhat, so I kept the batch size at 512. The model ran for 30 epochs, but it didn't improve after 20.
As a final result I got a validation accuracy of 0.943 and a test accuracy of 0.95.
In the loss chart below the training section you can see that the validation loss starts to rise again after about 10 epochs of training. That shows that the net is still overfitting, but I'm happy with the results it produces anyway.
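One cheap remedy would be to keep the checkpoint from the epoch with the lowest validation loss rather than the last one. A sketch of picking that epoch from a recorded loss history (the numbers here are made up for illustration):

```python
def best_epoch(validation_losses):
    """Index (0-based) of the epoch with the lowest validation loss."""
    return min(range(len(validation_losses)), key=lambda i: validation_losses[i])

losses = [1.2, 0.8, 0.5, 0.45, 0.5, 0.6, 0.7]  # rises again -> overfitting
print(best_epoch(losses))  # 3
```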
import tensorflow as tf
from tensorflow.contrib.layers import flatten
def LeNet(x):
    # Arguments used for tf.truncated_normal, randomly defines variables for the weights and biases for each layer
    mu = 0
    sigma = 0.1

    # SOLUTION: Layer 1: Convolutional. Input = 32x32x3. Output = 28x28x16.
    conv1_W = tf.Variable(tf.truncated_normal(shape=(5, 5, 3, 16), mean = mu, stddev = sigma), name="conv1_W")
    conv1_b = tf.Variable(tf.zeros(16), name="conv1_b")
    conv1 = tf.nn.conv2d(x, conv1_W, strides=[1, 1, 1, 1], padding='VALID') + conv1_b
    # SOLUTION: Activation.
    conv1 = tf.nn.relu(conv1, name="conv1_relu")
    # SOLUTION: Pooling. Input = 28x28x16. Output = 14x14x16.
    conv1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID', name="conv1_maxpool")

    # SOLUTION: Layer 2: Convolutional. Output = 10x10x32.
    conv2_W = tf.Variable(tf.truncated_normal(shape=(5, 5, 16, 32), mean = mu, stddev = sigma), name="conv2_W")
    conv2_b = tf.Variable(tf.zeros(32), name="conv2_b")
    conv2 = tf.nn.conv2d(conv1, conv2_W, strides=[1, 1, 1, 1], padding='VALID') + conv2_b
    # SOLUTION: Activation.
    conv2 = tf.nn.relu(conv2, name="conv2_relu")
    # SOLUTION: Pooling. Input = 10x10x32. Output = 5x5x32.
    conv2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID', name="conv2_maxpool")

    # SOLUTION: Flatten. Input = 5x5x32. Output = 800.
    fc0 = flatten(conv2)

    # Layer 3: Fully Connected. Input = 800. Output = 200.
    fc1_W = tf.Variable(tf.truncated_normal(shape=(800, 200), mean = mu, stddev = sigma))
    fc1_b = tf.Variable(tf.zeros(200))
    fc1 = tf.matmul(fc0, fc1_W) + fc1_b
    # SOLUTION: Activation.
    fc1 = tf.nn.relu(fc1)

    # SOLUTION: Layer 4: Fully Connected. Input = 200. Output = 100.
    fc2_W = tf.Variable(tf.truncated_normal(shape=(200, 100), mean = mu, stddev = sigma))
    fc2_b = tf.Variable(tf.zeros(100))
    fc2 = tf.matmul(fc1, fc2_W) + fc2_b
    # SOLUTION: Activation.
    fc2 = tf.nn.relu(fc2)
    # Dropout
    drop = tf.nn.dropout(fc2, keep_prob)

    # SOLUTION: Layer 5: Fully Connected. Input = 100. Output = 43.
    fc3_W = tf.Variable(tf.truncated_normal(shape=(100, 43), mean = mu, stddev = sigma))
    fc3_b = tf.Variable(tf.zeros(43))
    logits = tf.matmul(drop, fc3_W) + fc3_b

    return logits
A validation set can be used to assess how well the model is performing. A low accuracy on the training and validation sets imply underfitting. A high accuracy on the training set but low accuracy on the validation set implies overfitting.
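That rule of thumb can be written down directly (the thresholds here are arbitrary illustrations, not tuned values):

```python
def diagnose(train_acc, valid_acc, low=0.90, gap=0.05):
    """Crude fit diagnosis from training and validation accuracy."""
    if train_acc < low and valid_acc < low:
        return "underfitting"   # low accuracy on both sets
    if train_acc - valid_acc > gap:
        return "overfitting"    # high on train, much lower on validation
    return "reasonable fit"

print(diagnose(0.99, 0.85))  # overfitting
```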
Variables
x = tf.placeholder(tf.float32, (None, 32, 32, 3), name= "x")
y = tf.placeholder(tf.int32, (None), name= "y")
one_hot_y = tf.one_hot(y, 43, name= "one_hot_encoder")
keep_prob = tf.placeholder(tf.float32)
Training Pipeline
rate = 0.001
logits = LeNet(x)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=one_hot_y, logits=logits)
loss_operation = tf.reduce_mean(cross_entropy)
optimizer = tf.train.AdamOptimizer(learning_rate=rate)
training_operation = optimizer.minimize(loss_operation)
Evaluation
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(one_hot_y, 1))
accuracy_operation = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
saver = tf.train.Saver()
def evaluate(X_data, y_data):
    num_examples = len(X_data)
    total_accuracy = 0
    total_loss = 0
    sess = tf.get_default_session()
    for offset in range(0, num_examples, BATCH_SIZE):
        batch_x, batch_y = X_data[offset:offset+BATCH_SIZE], y_data[offset:offset+BATCH_SIZE]
        # Fetch accuracy and loss in a single run to avoid a second forward pass.
        accuracy, loss = sess.run([accuracy_operation, loss_operation],
                                  feed_dict={x: batch_x, y: batch_y, keep_prob: 1.0})
        total_accuracy += (accuracy * len(batch_x))
        total_loss += (loss * len(batch_x))
    return total_accuracy / num_examples, total_loss / num_examples
Training
from sklearn.utils import shuffle
EPOCHS = 30
BATCH_SIZE = 512
dropout = 0.5
%%time
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    num_examples = len(X_train_prepared)
    train_losses = []
    validation_losses = []
    train_accs = []
    validation_accs = []
    print("Training...")
    print()
    for i in range(EPOCHS):
        # Shuffle normal images.
        X_train_shuffled, y_train_shuffled = shuffle(X_train_prepared, y_train)
        # Shuffle augmented images. Allocate the arrays once, outside the loop,
        # so earlier augmentation runs are not overwritten with zeros.
        X_train_augmented_shuffled = np.zeros(X_train_augmented_prepared.shape)
        y_train_augmented_shuffled = np.zeros((AUGMENTATION_RUNS,) + y_train.shape)
        for k in range(AUGMENTATION_RUNS):
            X_train_augmented_shuffled[k], y_train_augmented_shuffled[k] = shuffle(X_train_augmented_prepared[k], y_train)
        # Do batch runs.
        for offset in range(0, num_examples, BATCH_SIZE):
            end = offset + BATCH_SIZE
            batch_x, batch_y = X_train_shuffled[offset:end], y_train_shuffled[offset:end]
            # Do a normal run.
            sess.run(training_operation, feed_dict={x: batch_x, y: batch_y, keep_prob: dropout})
            # Do runs with augmented images.
            for augmentation_run in range(AUGMENTATION_RUNS):
                augmented_batch_x = X_train_augmented_shuffled[augmentation_run][offset:end]
                augmented_batch_y = y_train_augmented_shuffled[augmentation_run][offset:end]
                sess.run(training_operation, feed_dict={x: augmented_batch_x, y: augmented_batch_y, keep_prob: dropout})
        # Calculate and print losses and accuracies.
        train_accuracy, train_loss = evaluate(X_train_prepared, y_train)
        validation_accuracy, validation_loss = evaluate(X_valid_prepared, y_valid)
        train_losses.append(train_loss)
        validation_losses.append(validation_loss)
        train_accs.append(train_accuracy)
        validation_accs.append(validation_accuracy)
        print("EPOCH {} ...".format(i+1))
        print("Train Accuracy = {0:.3f}".format(train_accuracy))
        print("Validation Accuracy = {0:.3f}".format(validation_accuracy))
        print("Validation Loss = " + str(validation_loss))
        print()
    saver.save(sess, './lenet')
    print("Model saved")
pd.DataFrame({"Trainloss": train_losses, "Validationloss": validation_losses}).plot()
pd.DataFrame({"Trainacc": train_accs, "Validationacc": validation_accs}).plot()
Final test
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('.'))
    test_accuracy, test_loss = evaluate(X_test_prepared, y_test)
    print("Test Accuracy = {:.3f}".format(test_accuracy))

with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('.'))
    prediction = tf.argmax(logits, 1)
    pred = sess.run(prediction, feed_dict = {x: X_test_prepared, keep_prob: 1.0})
from sklearn.metrics import confusion_matrix
def plot_nth_most_confused(n, true_values, predicted_values):
    conf = confusion_matrix(true_values, predicted_values)
    np.fill_diagonal(conf, 0)
    flat = conf.flatten()
    flat.sort()
    # Classes of the n-th largest off-diagonal confusion count.
    real, pred = np.where(conf == flat[-n])
    real_class = np.where(y_test == real[0])[0][0]
    pred_class = np.where(y_test == pred[0])[0][0]
    plt.subplot(1, 2, 1)
    plt.imshow(X_test[real_class])
    plt.subplot(1, 2, 2)
    plt.imshow(X_test[pred_class])
    plt.suptitle("Real Class: " + str(class_names.loc[real[0], "SignName"]) + "\nPrediction: " + str(class_names.loc[pred[0], "SignName"]))
These are examples of the most confused class pairs, from most to fewest confusions. The signs are indeed very similar.
plot_nth_most_confused(1, y_test, pred)
plot_nth_most_confused(2, y_test, pred)
plot_nth_most_confused(3, y_test, pred)
plot_nth_most_confused(4, y_test, pred)
As I live in Germany, I drove around and took some images myself. I mounted my phone behind the windshield to take them. The windscreen was quite dirty and from time to time it was raining, so the images will be challenging. There is also one 120 km/h speed limit that was displayed with LEDs, so the sign's background is black instead of white. I'm curious whether that one gets recognized by the net.
Here is one example of how the images were taken, including the dirty windshield in the corner :)
from scipy import misc  # note: scipy.misc.imread was removed in SciPy 1.2; newer versions need imageio.imread
example = misc.imread("images/example.jpg")
plt.imshow(example)
own_images = []
own_images.append(misc.imread("images/12.png"))
own_images.append(misc.imread("images/18-2.png"))
own_images.append(misc.imread("images/18.png"))
own_images.append(misc.imread("images/2.png"))
own_images.append(misc.imread("images/3.png"))
own_images.append(misc.imread("images/8-2.png"))
own_images.append(misc.imread("images/8.png"))
own_images = np.array(own_images)
own_images.shape
own_images = own_images[:, :, :, 0:3]
own_images.shape
own_classes = np.array([12, 18, 18, 2, 3, 8, 8])
for i in range(len(own_images)):
    plt.subplot(1, len(own_images), i + 1)
    plt.imshow(own_images[i])
    plt.axis('off')
prepared = prepare_images_for_deep_learning(own_images)
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('.'))
    prediction = tf.argmax(logits, 1)
    pred = sess.run(prediction, feed_dict = {x: prepared, keep_prob: 1.0})
class_names_join = class_names.set_index("ClassId")
pd.DataFrame({"Prediction": pred, "True Class": own_classes})\
.join(class_names_join, on=("Prediction")).rename(columns={"SignName": "SignName Prediction"})\
.join(class_names_join, on=("True Class")).rename(columns={"SignName": "SignName True Class"})
It confused some of the images. I really don't know how it confused class 18 with 17; confusing 8 with 14 seems more reasonable.
def plot_image_and_true_class(image, true_class):
    plt.subplot(1, 2, 1)
    plt.imshow(image)
    plt.title("Image")
    plt.subplot(1, 2, 2)
    plt.title("Prediction")
    plt.imshow(X_test[np.where(y_test == true_class)][0])
for i in range(len(own_images)):
    plot_image_and_true_class(own_images[i], pred[i])
    plt.show()
The accuracy on these images:
from sklearn.metrics import accuracy_score
accuracy_score(own_classes, pred)
Below you can see the top 5 class probabilities for each of my own images. The model has lower probabilities on the "General caution" sign, as there are very similar signs. The LED speed limit sign also has low probabilities.
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('.'))
    probs = tf.nn.top_k(tf.nn.softmax(logits), 5)
    pred_probs = sess.run(probs, feed_dict = {x: prepared, keep_prob: 1.0})

float_formatter = lambda x: "%.2f" % x
np.set_printoptions(formatter={'float_kind': float_formatter})
for i in range(len(own_classes)):
    predicted = pd.DataFrame({"Predicted": pred_probs[1][i], "Probability": pred_probs[0][i]}).set_index("Predicted").join(class_names)
    predicted.plot.bar(x = "SignName", y = "Probability")
    plt.title("True Class: " + class_names.loc[own_classes[i], "SignName"])
    plt.ylim(0, 1)
This section is not required to complete, but acts as an additional exercise for understanding the output of a neural network's weights. While neural networks can be a great learning device, they are often referred to as a black box. We can better understand what the weights of a neural network look like by plotting their feature maps. After successfully training your neural network, you can see what its feature maps look like by plotting the output of the network's weight layers in response to a test stimulus image. From these plotted feature maps, it's possible to see what characteristics of an image the network finds interesting. For a sign, maybe the inner feature maps react with high activation to the sign's boundary outline or to the contrast in the sign's painted symbol.
Provided below is function code that lets you get the visualization output of any TensorFlow weight layer you want. The inputs to the function should be a stimulus image, either one used during training or a new one you provide, and the TensorFlow variable name that represents the layer's state during the training process. For instance, if you wanted to see what the LeNet lab's feature maps looked like for its second convolutional layer, you could enter conv2 as the tf_activation variable.
For an example of what feature map outputs look like, check out NVIDIA's results in their paper End-to-End Deep Learning for Self-Driving Cars, in the section Visualization of internal CNN State. NVIDIA was able to show that their network's inner weights had high activations to road boundary lines by comparing feature maps from an image with a clear path to one without. Try experimenting with a similar test to show that your trained network's weights are looking for interesting features, whether by comparing feature maps from images with and without a sign, or by comparing a trained network against a completely untrained one on the same sign image.
### Visualize your network's feature maps here.
### Feel free to use as many code cells as needed.
# image_input: the test image being fed into the network to produce the feature maps
# tf_activation: should be a tf variable name used during your training procedure that represents the calculated state of a specific weight layer
# activation_min/max: can be used to view the activation contrast in more detail, by default matplot sets min and max to the actual min and max values of the output
# plt_num: used to plot out multiple different weight feature map sets on the same block, just extend the plt number for each new feature map entry
def outputFeatureMap(image_input, tf_activation, activation_min=-1, activation_max=-1, plt_num=1):
    # Here make sure to preprocess your image_input in a way your network expects
    # with size, normalization, etc. if needed
    # image_input =
    # Note: x should be the same name as your network's tensorflow data placeholder variable
    # If you get an error that tf_activation is not defined, it may be having trouble accessing the variable from inside a function
    activation = tf_activation.eval(session=sess, feed_dict={x: image_input})
    featuremaps = activation.shape[3]
    plt.figure(plt_num, figsize=(15, 15))
    for featuremap in range(featuremaps):
        plt.subplot(6, 8, featuremap + 1)  # sets the number of feature maps to show on each row and column
        plt.title('FeatureMap ' + str(featuremap))  # displays the feature map number
        # "and" instead of "&" here: "&" binds tighter than "!=" and compared the wrong operands.
        if activation_min != -1 and activation_max != -1:
            plt.imshow(activation[0, :, :, featuremap], interpolation="nearest", vmin=activation_min, vmax=activation_max, cmap="gray")
        elif activation_max != -1:
            plt.imshow(activation[0, :, :, featuremap], interpolation="nearest", vmax=activation_max, cmap="gray")
        elif activation_min != -1:
            plt.imshow(activation[0, :, :, featuremap], interpolation="nearest", vmin=activation_min, cmap="gray")
        else:
            plt.imshow(activation[0, :, :, featuremap], interpolation="nearest", cmap="gray")
In these activations, you can clearly recognize the number, the round shape and the ring. FeatureMap 8 seems to look for a dark sign in front of a bright background, although the background should not be important.
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('.'))
    activation = tf.get_default_graph().get_tensor_by_name("conv1_relu:0")
    reshaped_image = np.reshape(prepared[3], (1, 32, 32, 3))
    outputFeatureMap(reshaped_image, activation)
With this sign it is really hard to recognize the numbers and the shape of the sign.
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('.'))
    activation = tf.get_default_graph().get_tensor_by_name("conv1_relu:0")
    reshaped_image = np.reshape(prepared[4], (1, 32, 32, 3))
    outputFeatureMap(reshaped_image, activation)
This is the 120 km/h sign displayed with LEDs, which got confused with another class. I wonder why, as its activation map doesn't look too different from those of the other speed limit signs.
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('.'))
    activation = tf.get_default_graph().get_tensor_by_name("conv1_relu:0")
    reshaped_image = np.reshape(prepared[5], (1, 32, 32, 3))
    outputFeatureMap(reshaped_image, activation)
This is the second layer's activation. You can't really see a lot, as the resolution is very low. Maybe I should try to remove the max pool layers... another time :)
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('.'))
    activation = tf.get_default_graph().get_tensor_by_name("conv2_relu:0")
    reshaped_image = np.reshape(prepared[3], (1, 32, 32, 3))
    outputFeatureMap(reshaped_image, activation)